A Novel Gaussian Based Similarity Measure for Clustering Customer Transactions Using Transaction Sequence Vector

Authors

  • M. S. B. Phridvi Raj
  • Vangipuram Radhakrishna
  • C. V. Guru Rao
Abstract

Clustering transactions in sequence, temporal, and time series databases is attracting considerable attention from database researchers and the software industry. Significant research is directed at defining and validating new similarity measures for such databases that can accurately and efficiently find the similarity between user transactions in a given database in order to predict user behavior. The distribution of the items present in the transactions contributes greatly to the degree of similarity between them; this is the key idea behind the proposed similarity measure. The main objective of the research is first to design an efficient similarity measure that considers both the distribution of the items in the item set over the entire transaction data set and the commonality of items present in the transactions; neglecting these is the major drawback of the Jaccard, cosine, and Euclidean similarity measures. We then analyze the worst-case, average-case, and best-case situations. The proposed similarity measure is Gaussian based and preserves the properties of the Gaussian function. It may be used both to cluster and to classify user transactions and to predict user behavior.
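
For orientation, the three baseline measures named in the abstract can be sketched on binary transaction vectors as follows. This is only an illustrative sketch: the item names, the sample transactions, and the helper `to_binary_vector` are hypothetical, and the code does not implement the proposed Gaussian-based measure, whose formula is not given in the abstract.

```python
import math

def to_binary_vector(transaction, items):
    """Encode a transaction (a set of items) as a 0/1 vector over `items`.
    Hypothetical helper, not taken from the paper."""
    return [1 if item in transaction else 0 for item in items]

def jaccard_similarity(a, b):
    """Size of the intersection divided by size of the union of two item sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def euclidean_distance(u, v):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

# Made-up example data (an assumption for illustration only).
items = ["bread", "milk", "eggs", "tea"]
t1 = {"bread", "milk"}
t2 = {"bread", "eggs"}

u = to_binary_vector(t1, items)
v = to_binary_vector(t2, items)

print(jaccard_similarity(t1, t2))
print(cosine_similarity(u, v))
print(euclidean_distance(u, v))
```

All three values depend only on the two transactions being compared; none of them uses how the items are distributed across the rest of the data set, which is the aspect the abstract says the proposed measure takes into account.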


Similar resources

An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering

Multivariate time series (MTS) data are ubiquitous in science and daily life, and how to measure their similarity is a core part of the MTS analysis process. Many research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative...


Similarity of Transactions for Customer Segmentation

Customer segmentation is usually the first step towards customer analysis and helps a company make strategic plans. Similarity between customers plays a key role in customer segmentation, and is usually evaluated by distance measures. While various distance measures have been proposed in the data mining literature, the desirable distance measures for various data sources and given applicatio...


CUSTOMER CLUSTERING BASED ON FACTORS OF CUSTOMER LIFETIME VALUE WITH DATA MINING TECHNIQUE

Organizations have used Customer Lifetime Value (CLV) as an appropriate pattern to classify their customers. Data mining techniques have enabled organizations to analyze their customers’ behaviors more quantitatively. This research clusters customers through data mining, based on the factors of the CLV model: length, recency, frequency, and monetary (LRFM). Based on LRFM,...


An Ant Colony Clustering Algorithm Using Fuzzy Logic

The performance of data partitioning using machine learning techniques is calculated only with distance measures, i.e., similarity between the transactions is computed with the help of distance measurement algorithms such as the Euclidean distance measure and the cosine distance measure. The distance with connectivity (DWC) model is used to estimate distance between transactions with local consistency ...


Pattern-Oriented Clustering of Web Transactions

We propose a method for clustering web transaction data based on the idea that patterns generated within a cluster are similar to each other and different from patterns generated from other clusters. To do this, we define the difference between clusters and the similarity of transactions within a cluster using the notion of itemsets. A preliminary experiment on user-centric web browsing data de...



Journal title:
  • CoRR

Volume abs/1604.05274, Issue -

Pages -

Publication date 2015